Virti-Cue Social Modeling Tutorial Application: Evaluation
Jeannette Jackson, Vanita Gupta, Chris Blais, Robin Halbert
EDER 679.28
Dr. Michele Jacobsen
University of Calgary
Virti-Cue Social Modeling Application: Evaluation
The evaluation phase of our product development is an integral part of the design process. It is intended to gather information about users’ experiences when interacting with our tutorial prototypes. The purpose of our evaluation was to assess the effectiveness of our learning application in meeting the needs of our learners and to "check that users can use the product and that they like it" (Preece, Rogers, & Sharp, 2007, p. 586). To accomplish this, we engaged in usability testing with a representative group of five users. As noted by Nielsen (2000), “the best results come from testing no more than five users.”
The decision to engage in parallel design, creating alternate designs at the same time (Nielsen, 2011), was made because of the inconclusive evidence regarding the effectiveness of animations as compared with equivalent static graphics in learning applications (Ayres, Kalyuga, Marcus, & Sweller, 2005; Ayres & Paas, 2007). These two design prototypes, a static version and a dynamic version of the tutorial, were considered and tested by our users. The intent of this testing was to determine the effectiveness of an animated tutorial as compared with an equivalent static graphical/textual tutorial in teaching users how to create a new story and add pictures. All components of the tutorial design were the same in both prototypes, except for the use of a voice-over for guidance versus textual instructions, and animation versus still images. The animated tutorial also incorporated introductory and closing music tracks. The research question that we investigated is: Which presentation (dynamic or static) do users prefer in a tutorial application, and what are the best features of each? In addressing this question, we also kept in mind our usability goals (that the tutorial be effective, efficient, and easy to remember) and our user experience goals (that it be satisfying, enjoyable, helpful, and rewarding).
Evaluation Framework: DECIDE
Jakob Nielsen (2006) warned about the disturbing trend wherein user testing is done only to educate management. He stated, “Don't run your studies for the benefit of the people in the observation room. Test to discover the truth about the design, even when user tasks are boring to watch” (Nielsen, 2006). Taking this warning to heart, we kept our testing focused on genuine questions about the design rather than on what observers might find interesting to watch.
To tailor our usability evaluation methods and to promote effective use of the interface, we identified our goal and the measures surrounding that goal that would guide our user testing. As Preece, Rogers, and Sharp (2002) suggest, “goals and questions should guide all evaluation studies” (p. 360) in that they "provide a focus for observation, as the DECIDE framework points out.” Hence, we determined the goals of our evaluation based on the DECIDE framework, as outlined below:
1. Determine the goals and Explore the questions -- We will observe our users to determine which presentation (dynamic or static) they prefer in a tutorial application. We will also establish a second goal for the next round of usability evaluation, in which we would like to measure learning after the tutorial, and we will explore the questions in an informal interview, asking users for suggestions on the best features of each presentation.
2. Choose the paradigm and techniques -- We will conduct parallel prototype testing and record data gathered through observations, informal interviews, and a questionnaire, administered as an online survey.
3. Identify the practical issues -- We will endeavour to have our two prototypes and the online survey ready on the computers and ensure our users get sufficient time to complete both tutorials as well as the online survey. We will determine the roles within our group, including who will observe, record, and question our users.
4. Deal with ethical issues – Since our online survey is voluntary and names are not required, we have not prepared a formal consent form; however, if this testing were conducted in the field, we would ensure that consent forms were completed before conducting the survey.
5. Evaluate, analyze, and present the data -- Based on the feedback we receive, we will evaluate and analyze our product and conduct pilot testing prior to launching it. When presenting the data, we will collate the informal observation and interview data and look for themes and issues, ensuring that the data presented are reliable and valid.
Evaluation Methodology: Research Foundations
As noted above, in our product evaluation we plan to use observation, informal interviews, and a questionnaire, methods which, according to Preece et al. (2002), "can be used on their own or in conjunction with other methods to clarify or deepen understandings" (p. 398). The following paragraphs summarize the research that informed our decisions regarding each of these methods.
Observations
In keeping with our evaluation goal, we focused on observing users to determine which tutorial presentation, dynamic or static, was preferred. As Preece et al. (2002) caution, “having a goal, even a very general goal, helps to guide the observation because there is always so much going on” (p. 361). Preece et al. present a number of guidelines for observing users that we employed during our usability testing. They recommend noting what is happening; what people are doing and saying; how they are behaving; and their tone and body language (Preece et al., 2002, p. 363). In recording our observations, we plan to work as a team, which has several benefits, such as allowing us to distribute the workload, focus on different contexts, compare observations, and generate more reliable data (Preece et al., 2002). Having learned from our experiences observing users during our interaction design work, we decided to record observations on a blank sheet rather than attempting to fill in pre-determined categories. As well, we will engage in member checks (i.e., confirming our interpretations with our users) to ensure that we are making sound interpretations and to improve the validity of the evaluation process (Creswell, 2007). As we record our observations, we will endeavour to separate personal opinion from what actually happens, as recommended by Preece et al. (2002). Our observation notes will be reviewed “as soon as possible after each evaluation session to flesh out detail and check ambiguities with other observers” (Preece et al., 2002, p. 369) or with our users.
Informal Open-Ended Interview
As noted by Preece et al. (2002), "interviews can be thought of as a 'conversation with a purpose'" (citing Kahn & Cannell, p. 390), and that was our intent in engaging in an open-ended interview. Given that our evaluation goal was to gain "first impressions about how users react" to our parallel designs, and to "explore users' general attitudes" (Nielsen, 2010, para. 16), an informal, open-ended interview best met our needs and served to provide a rich source of data (Preece et al., 2002, p. 390). Following the advice given by Preece et al. (citing Robson), we planned interview questions that were not too long; avoided compound questions, jargon, and leading questions; and reflected sensitivity to our own biases by striving for neutrality in our questions. Our interview agenda supported our study goals, as identified in the DECIDE framework. Our intent was to learn about our users' reactions to the parallel designs of the tutorial; therefore, our agenda included questions that allowed users to respond openly and freely regarding their impressions of the two formats. Open-ended questions such as "What did you think of this format?" and "How did this format compare to the other?" were used to begin the conversation. As Preece et al. (2002) recommend, we will be "prepared to follow new lines of enquiry that contribute to (y)our agenda" (p. 392) as they arise. As noted below in the 'Questionnaire' section, we were aware of "the query effect" (Nielsen, 2010, para. 19) and worked to ensure that the questions asked related directly to our evaluation goals.
Questionnaire
Preece et al. (2002) offer a number of guidelines for designing questionnaires, many of which we have adopted, including:
avoiding complex multiple questions
making questions clear and specific
beginning with general questions, followed by specific questions
providing clear instructions
using consistent rating scales
aiming for brevity over length to encourage completion (p. 400).
In addition, Nielsen (2004) suggests that in order to ensure high response rates and avoid misleading results, you should "keep your surveys short and ensure that your questions are well written and easy to answer" (summary). He adds, "the highest response rates come when surveys are quick and painless. And the best way to reduce users' time and suffering is to reduce the number of questions" (Nielsen, 2004, para. 3). The most influential of Nielsen's recommendations in our survey design was his advice regarding "survey bloat": "Please resist the temptation to collect all the information that anybody could ever want. You will end up with no information (or misleading information) instead" (para. 6). Related to this is the "query effect," also described by Nielsen (2010), who cautions, "whenever you do ask users for their opinions, watch out for the query effect: People can make up an opinion about anything, and they'll do so if asked." The lesson for our group was to be careful about what we asked, making sure it was information that we wanted to have and that mattered in our design. Therefore, questions regarding the overall product design were not included, and our focus remained on questions that served to address our research question.
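To illustrate how these guidelines might translate into a concrete instrument, the sketch below shows one possible way of structuring a short questionnaire with a single, consistent five-point rating scale and general items placed before specific ones. The question wording, item identifiers, and scale labels are illustrative assumptions, not the exact items from our Survey Monkey questionnaire.

```python
# Hypothetical sketch of a short questionnaire that follows the guidelines above:
# few questions, one consistent rating scale, general items before specific ones.
# Wording and identifiers are placeholders, not our actual survey items.

LIKERT_5 = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

questionnaire = [
    # General items first
    {"id": "q1", "type": "rating", "scale": LIKERT_5,
     "text": "The tutorial clearly explained how to create a new story."},
    {"id": "q2", "type": "rating", "scale": LIKERT_5,
     "text": "The tutorial clearly explained how to add pictures."},
    # Specific design items next
    {"id": "q3", "type": "rating", "scale": LIKERT_5,
     "text": "The icons used in the tutorial were easy to recognize."},
    # Comparison and open-ended items last
    {"id": "q4", "type": "open",
     "text": "Which tutorial format (animated or static) did you prefer, and why?"},
]

def validate(items, max_items=10):
    """Check the brevity and scale-consistency guidelines before distributing the survey."""
    assert len(items) <= max_items, "Survey is too long; trim questions."
    scales = {tuple(q["scale"]) for q in items if q["type"] == "rating"}
    assert len(scales) <= 1, "Rating questions should share one consistent scale."

validate(questionnaire)
```

Making the structure explicit in this way allows the brevity and rating-scale consistency guidelines to be checked before the survey is sent to users.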
Evaluation Methodology: Process
In the testing, we alternated which of the two tutorial design models was accessed first by each user. After viewing the tutorials, we conducted an informal open-ended interview to gauge user reactions and administered a follow-up questionnaire, which posed open-ended and rating questions regarding the effectiveness of the tutorial in explaining the navigation and the ease of use of the tutorial application, plus a segment comparing the two tutorial formats. Specific questions concerning design choices were included, such as layout, icon clarity, voice-over clarity, and instructional vocabulary. During the usability testing, we kept observational notes and tracked users’ progress, as described above. Interviews and a questionnaire were used to gauge users' reactions and gather related comments, an approach supported by research (U.S. Department of Health & Human Services, 2006, p. 190). The results of our usability testing are reported in the following paragraphs.
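The alternation of presentation order described above could be operationalized as a simple counterbalanced schedule, as in the hypothetical sketch below; the participant labels are placeholders rather than our actual users.

```python
# Minimal sketch of counterbalancing which tutorial each participant views first.
# Participant labels are hypothetical placeholders.

from itertools import cycle

participants = ["P1", "P2", "P3", "P4", "P5"]
orders = cycle([("dynamic", "static"), ("static", "dynamic")])

# Pair each participant with the next order in the alternating cycle.
schedule = {p: order for p, order in zip(participants, orders)}

for participant, (first, second) in schedule.items():
    print(f"{participant}: views the {first} tutorial first, then the {second} tutorial")
```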
Usability Testing: Observations/Results/Data Analysis
In analyzing and presenting our results, we will follow Preece et al. (2002), who describe qualitative data that is "interpreted and used to tell 'the story' about what was observed" and qualitative data that is "categorized using techniques such as content analysis" (p. 379). As they note, "much of the power of analyzing descriptive data lies in being able to tell a convincing story, illustrated with powerful examples that help to confirm the main points and will be credible" (p. 380). A brief sketch of how such categorization might be carried out follows below.
Observational notes summary.
Informal open-ended interview summary.
Survey Monkey questionnaire summary.
Usability Testing: Implications for Design
In implementing our design revisions, we will draw on what Nielsen says about merging parallel designs. Nielsen (1996) explains that "these interfaces are then merged to a unified design that can be further refined through additional iterative design" and that "each merged design element can be the best of the parallel versions' corresponding elements or a synthesis of several parallel versions' elements." We will apply the elements we have learned about through our testing, essentially creating a hybrid of our proposed formats that retains the features users identified as most valuable in each tutorial sample and discards those they found least useful.
Conclusion
After the initial usability testing, we collated and reviewed the data to inform design revisions, in keeping with iterative design. According to Nielsen (2011), combining parallel and iterative design methods “maximizes your chances of hitting on something better.”
A second goal for the next round of usability evaluation will be to measure learning after users complete the tutorial.
Preece et al. (2007, p. 646) discuss time and number in usability testing: “time and number are the two main measures used, in terms of the time it takes typical users to complete a task, such as finding a website, and the number of errors that participants make.” They list a number of quantitative performance measures: time to complete a task; time to complete a task after a specified time away from the product; number of errors per unit of time; number of navigations to online help or manuals; number of users making a particular error; and number of users completing a task successfully.
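As a rough illustration of how such measures could be derived, the sketch below computes time on task, errors per unit of time, and completion rate from hypothetical session logs; the figures shown are invented for illustration and are not results from our testing.

```python
# Illustrative sketch of the performance measures listed above, computed from
# hypothetical session logs (times in seconds); not actual test data.

sessions = [
    {"user": "P1", "task": "create_story", "seconds": 95,  "errors": 1, "completed": True},
    {"user": "P2", "task": "create_story", "seconds": 140, "errors": 3, "completed": True},
    {"user": "P3", "task": "create_story", "seconds": 210, "errors": 5, "completed": False},
]

times = [s["seconds"] for s in sessions]
mean_time = sum(times) / len(times)                               # time to complete the task
errors_per_minute = [s["errors"] / (s["seconds"] / 60) for s in sessions]  # errors per unit of time
completion_rate = sum(s["completed"] for s in sessions) / len(sessions)    # users completing the task

print(f"Mean time on task: {mean_time:.0f} s")
print(f"Errors per minute: {[round(e, 2) for e in errors_per_minute]}")
print(f"Completion rate: {completion_rate:.0%}")
```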
Among our future actions, we propose to record the time it takes users to apply what they have learned through the tutorial. We could also specify a time frame away from the application, such as a week or two, and then invite users to go through it again. This would be a useful way to test some of our research on memory, including Mayes and Roberts' point about the importance of providing visual information to users, as “in episodes experienced by humans, visual information is usually most salient" (Mayes & Roberts, p. 1396).
References
Ayres, P., Kalyuga, S., Marcus, N., & Sweller, J. (2005). The conditions under which instructional animations may be effective. Paper presented at an International Workshop and Mini-conference, Open University of the Netherlands, Heerlen, The Netherlands. Retrieved from www.ou.nl/Docs/Expertise/OTEC/Nieuws/icleps%20conferentie/Ayres.doc
Ayres, P., & Paas, F. (2007). Making instructional animations more effective: A cognitive load approach. Applied Cognitive Psychology, 21, 695-700. doi: 10.1002/acp.1343
Nielsen, J. (2000). Why you only need to test with 5 users. Jakob Nielsen's Alertbox, March 19, 2000. Retrieved from http://www.useit.com/alertbox/20000319.html
Nielsen, J. (2004). Keep online surveys short. Jakob Nielsen's Alertbox, February 2, 2004. Retrieved from http://www.useit.com/alertbox/20040202.html
Nielsen, J. (2006). User testing is not entertainment. Jakob Nielsen's Alertbox, September 11, 2006. Retrieved from http://www.useit.com/alertbox/user-testing-showbiz.html
Nielsen, J. (2010). Interviewing users. Jakob Nielsen's Alertbox, July 26, 2010. Retrieved from http://www.useit.com/alertbox/interviews.html
Nielsen, J. (2011). Parallel & iterative design + competitive testing = high usability. Jakob Nielsen's Alertbox, January 18, 2011. Retrieved from http://www.useit.com/alertbox/design-diversity-process.html
Preece, J., Rogers, Y., & Sharp, H. (2002). Interaction design: Beyond human-computer interaction (1st ed.). New York, NY: John Wiley & Sons.
Preece, J., Rogers, Y., & Sharp, H. (2007). Interaction design: Beyond human-computer interaction (2nd ed.). New York, NY: John Wiley & Sons.
U.S. Department of Health & Human Services. (2006). Research-based web design & usability guidelines. Retrieved from http://www.usability.gov/guidelines/index.html